An Analytical Model-Based Auto-tuning Framework for Locality-Aware Loop Scheduling
Abstract
HPC developers aim to deliver the best possible performance. To do so, they constantly weigh memory bandwidth, the memory hierarchy, locality, floating-point performance, power and energy constraints, and so on. Application scientists, on the other hand, aim to write performance-portable code while still exploiting the rich feature set of the hardware. By giving the compiler adequate hints in the form of directives, they let it generate appropriate executable code. Directive-based programming offers tremendous benefits. However, applications are becoming increasingly complex, and sophisticated tools such as auto-tuning are needed to explore the optimization space more effectively. Loops typically make up a major and time-consuming portion of application code. Scheduling these loops means mapping the loop iteration space onto the underlying platform, for example onto GPU threads. Users typically try different scheduling techniques until they identify the best one. This process can be tedious and time-consuming, especially for a relatively large application, because the user must record the performance of every schedule's run. This paper offers a better solution: an auto-tuning framework that uses an analytical model to guide the compiler and the runtime in automatically choosing an appropriate schedule for each loop and in determining the launch configuration for each loop schedule. Our experiments show that the loop schedule predicted by our framework achieves an average speedup of 1.29x over the default loop schedule chosen by the compiler.
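The abstract does not describe the analytical model itself. Purely as an illustration of the general idea, here is a minimal sketch, assuming a simple memory-coalescing cost model, that picks between two candidate schedules for a 2-D loop nest reading `A[i][j]` from a row-major array and then derives a launch configuration (grid and block size) for the chosen schedule. Every name here (`Schedule`, `choose_schedule`, `estimated_cost`, ...) is hypothetical, and the hardware constants (32-thread warps, 128-byte transactions, 8-byte elements, 256-thread blocks) are assumptions; this is not the authors' framework.

```python
import math
from dataclasses import dataclass

# Hypothetical hardware parameters (assumptions, not taken from the paper).
WARP_SIZE = 32        # threads that issue a memory access together
SEGMENT_BYTES = 128   # bytes served by one coalesced memory transaction
ELEM_BYTES = 8        # size of one array element (double precision)


@dataclass(frozen=True)
class Schedule:
    name: str
    # True: both loops run in parallel and j varies fastest across threads.
    # False: only loop i is parallel; each thread walks j sequentially.
    collapse: bool


def stride_bytes(s: Schedule, n_cols: int) -> int:
    """Byte distance between addresses read by neighbouring threads
    when the loop body reads A[i][j] from a row-major array."""
    return ELEM_BYTES if s.collapse else n_cols * ELEM_BYTES


def transactions_per_warp(s: Schedule, n_cols: int) -> int:
    """Memory transactions needed for one warp-wide access (alignment ignored)."""
    stride = stride_bytes(s, n_cols)
    if stride >= SEGMENT_BYTES:
        return WARP_SIZE  # every thread needs its own transaction
    span = (WARP_SIZE - 1) * stride + ELEM_BYTES
    return math.ceil(span / SEGMENT_BYTES)


def parallel_iterations(s: Schedule, n_rows: int, n_cols: int) -> int:
    """Number of GPU threads the schedule needs."""
    return n_rows * n_cols if s.collapse else n_rows


def launch_config(n_threads: int, block_size: int = 256) -> tuple[int, int]:
    """Return (grid_size, block_size) so that all n_threads threads are covered."""
    return math.ceil(n_threads / block_size), block_size


def estimated_cost(s: Schedule, n_rows: int, n_cols: int) -> int:
    """Approximate memory transactions for one full sweep over A."""
    warp_accesses = math.ceil(n_rows * n_cols / WARP_SIZE)
    return warp_accesses * transactions_per_warp(s, n_cols)


def choose_schedule(n_rows: int, n_cols: int, block_size: int = 256):
    """Pick the cheapest candidate schedule and its launch configuration."""
    candidates = [Schedule("collapse(i,j)", True), Schedule("parallel i only", False)]
    best = min(candidates, key=lambda s: estimated_cost(s, n_rows, n_cols))
    grid, block = launch_config(parallel_iterations(best, n_rows, n_cols), block_size)
    return best, grid, block


if __name__ == "__main__":
    best, grid, block = choose_schedule(1024, 1024)
    print(best.name, grid, block)  # collapse(i,j) 4096 256
```

For a 1024 x 1024 array, the collapsed schedule gives unit-stride (coalesced) accesses at 2 transactions per warp, versus 32 when neighbouring threads stride by a whole row, so the model selects it and sizes the grid for 1024 x 1024 threads. A realistic model would also account for factors such as occupancy, caches, and arithmetic intensity.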
Similar resources
Final Report: Compiler-Driven Performance Optimization and Tuning for Multicore Architectures
The widespread emergence of multicore processors as the computing engine in all commodity platforms presents our field with an enormous software development crisis. For over two decades, sequential software applications have enjoyed the free-ride of performance improvement with each new pr...
A Hybrid Framework Bridging Locality Analysis and Cache-Aware Scheduling for CMPs
Industry is rapidly moving towards the adoption of Chip Multi-Processors (CMPs). The sharing of memory hierarchy becomes deeper and heterogeneous. Without a good understanding of the sharing, most current systems schedule processes in a contention-oblivious way, causing systems severely underutilized with sub-optimal throughput and cache thrashing. In this report, we propose a three-stage frame...
An ANOVA Based Analytical Dynamic Matrix Controller Tuning Procedure for FOPDT Models
Dynamic Matrix Control (DMC) is a widely used model predictive controller (MPC) in industrial plants. The successful implementation of DMC in practical applications requires a proper tuning of the controller. The available tuning procedures are mainly based on experience and empirical results. This paper develops an analytical tool for DMC tuning. It is based on the application of Analysis of V...
Neural Network Assisted Tile Size Selection
Data locality optimization plays a significant role in reducing the execution time of many loop-intensive kernels. Loop tiling at various levels is often used to effectively exploit data locality in deep memory hierarchies. The recent development of frameworks for parametric loop tiling of user code has led to a widening of the range of applications that could benefit from auto-tunin...
On Parameterized Tiled Loop Generation and Its Parallelization
Tiling is a loop transformation that decomposes computations into a set of smaller computation blocks. The transformation has proved to be useful for many high-level program optimizations, such as data locality optimization and exploiting coarse-grained parallelism, and crucial for architecture with limited resources, such as embedded systems, GPUs, and the Cell. Data locality and parallelism w...
Publication date: 2016